Data exploration

sn <- read.csv("data/clean/cleaned_combined_minitrial_data.csv")
sn$Recipe <- as.factor(case_when( 
    sn$Recipe == "FWS12"  ~  "FWS1:S2",
    sn$Recipe == "FWS13" ~ "FWS1:S3",
    sn$Recipe == "FWS31" ~ "FWS3:S1" ))

Note that the conditions were set to have 3 different loading rates, which we will treat as categorical because while they are numerical values of .5, 2, or 3.5, these are not observed values. There for we set them to be factors (which allows for categories to be ranked by a inheirent value).

Next, we can do some explortatory analysis:

sn$OL <- as.factor(sn$OL)
ggscatmat(sn %>% 
            select(-FOG, -BW, -DCS, -init_Coliforms, -init_Ecoli, -init_Enterococci), color="Recipe", alpha = .3) + theme(legend.position = "bottom")
## Warning in ggscatmat(sn %>% select(-FOG, -BW, -DCS, -init_Coliforms, -
## init_Ecoli, : Factor variables are omitted in plot

#ggpairs(sn, aes(color=Recipe, alpha=.5))

Initial observations:

Lets replot without some of the highly correlated ones that

ggscatmat(sn %>% 
            select(-TS, -Coliforms ) %>%
            select(-FOG, -BW, -DCS, -init_Coliforms, -init_Ecoli, -init_Enterococci), color="Recipe", alpha = .3) + theme(legend.position = "bottom")
## Warning in ggscatmat(sn %>% select(-TS, -Coliforms) %>% select(-FOG, -BW, :
## Factor variables are omitted in plot

Lets look at the indicators

Enterococci

It looks like there is a big effect of temperature: Enterococci die-off only occurs at 55, and there may be a small degree of die-off at 37 for FWS12. Keeping inmind that 55 degree operation is rarely cost effective, lets look without the 55 degree condition.

It appears that nothing dirastic is happening to the Enterococci regardless of recipe for 19 and 37 degrees.

We can further split this out by the proportion of the different componenets in the feedstock

ggplot(sn, aes(x=Time, y=Enterococci, color=as.factor(DCS))) + 
  geom_point() + 
  labs(caption=fn("Effect of OL vs DCS"), 
       subtitle="No effect due to proportion of DCS", color="") +
  facet_wrap(~OL+Temp)  + 
  geom_smooth(method = lm)

ggplot(sn, aes(x=Temp, y=Enterococci, color=as.factor(BW))) + 
  labs(caption=fn("Effect of Temp"), color="BW") +
  geom_point() + geom_smooth(method = lm)

ggplot(sn, aes(x=Time, y=Enterococci, color=as.factor(Temp))) +
  geom_point() + 
  labs(subtitle="At 19 and 37, presence of BW increases Enterococci survival",
       caption=fn("Effect of BW"), color="") +
  facet_wrap(~BW)  + geom_smooth(method = lm)

Those rates seem to indicate 2 things:

  • that the higher proportion of restaurant food waste, the better enterococci survive
  • regardless of the proportion of BW, increased temperature inhibits Enterococci

E. coli

ggplot(sn, aes(x=Time, y=Ecoli, color=as.factor(Temp))) +
  geom_point()  + geom_smooth(method = lm) + 
  labs(caption=fn("Effect of Temperature"), color="")

ggplot(sn, aes(x=Time, y=Ecoli, color=as.factor(Temp))) +
  geom_point() + 
  facet_wrap(~OL+Recipe)  +
  geom_smooth(method = lm) + 
  labs(caption=fn("Effect of OL and Recipe"), color="")

E coli counts seem to depend largely on Temp rather than OL or Recipe.

This may indicate that while die-off occurs at cold temperature, higher temperatue increases the die-off rate.

While dieoff occurs at 37 and 55, it also appears to occur at 19 with FWS13.

In All cases at low temperatures, lower OL yields better dieoff.

Coliforms

Coliforms should be have the same as E. coli (as we saw their values being highly correlated), but lets observe:

ggplot(sn, aes(x=Time, y=Coliforms, color=as.factor(Temp))) +
  geom_point()  + geom_smooth(method = lm) + 
  labs(caption=fn("Effect of Temperature"), color="")

Coliform counts too seem to depend largely on Temp rather than OL or Recipe.

Except for FWS31, lower OL inproves die-off.

Ammonia

ggplot(sn, aes(x=Time, y=Ammonia, color=as.factor(Temp))) +
  geom_point()  + geom_smooth(method = lm) + 
  facet_wrap(~Recipe )  +
  labs(caption=fn("Effect of Temperature and Recipe"), color="")

Across all recipes and temperatures, Ammonia decreases as a function of time.

Methane

Methane is the only thing leaving the bioreactors (as biogas), and therefore was measured daily until detructive sampling at 3, 6, or 9 days. We have aggregated the data in the original excel sheet.

ggplot(sn, aes(x=Time, y=Methane, color=as.factor(Recipe))) +
  geom_point()  + geom_smooth(method = lm) + 
  facet_wrap(~Temp )  +
  labs(caption=fn("Effect of Temperature and Recipe"), color="")

ggplot(sn, aes(x=Time, y=Methane, color=as.factor(Temp))) +
  geom_point()  + geom_smooth(method = lm) + 
  facet_wrap(~Recipe )  +
  labs(caption=fn("Effect of Recipe and Temperature"), color="")

ggplot(sn, aes(x=Time, y=Methane, color=as.factor(Temp))) +
  geom_point()  + geom_smooth(method = lm) + 
  facet_wrap(~Recipe + OL )  +
  labs(caption=fn("Effect of Temperature and OL"), color="")

It looks like the rate of poduction is highest at 37, and that at the optimal feed stocks with a high OL yield high prroduction, expecially at 37.

pH

ggplot(sn, aes(x=Time, y=pH, color=as.factor(Recipe))) +
  geom_point()  + geom_smooth(method = lm) + 
  facet_wrap(~Temp )  +
  labs(caption=fn("Effect of Temperature and Recipe"), color="")

The range of pH is rerlatively tight, but it appears to drop with time at the temperature extremes, staying the most stable at 37

Plots for paper:

fn = local({
  # from https://github.com/yihui/knitr-examples/blob/master/070-caption-num.Rmd 
  i = 0
  function(x) {
    i <<- i + 1
    paste('Figure ', i, ': ', x, sep = '')
  }
})
sntall <- sn %>% gather(key = "FIB", value = "Count", Ecoli, Coliforms, Enterococci) %>%
   gather(key = "initbug", value = "init_count", init_Ecoli, init_Coliforms, init_Enterococci) %>% 
  filter(FIB == gsub("init_", "", initbug)) 
sntall$initbug <- gsub("init_", "", sntall$initbug)
#sntall$init_count <- ifelse(sntall$FIB == sntall$initbug, sntall$init_count, 10)
#sntall$rec_color <- c("darkred", "purple", "brown")[as.factor(sntall$Recipe)]
ggplot(sntall, aes(x=Time, y=Count, color=as.factor(Recipe))) +
  geom_hline(aes(yintercept=init_count, color=Recipe),linetype="dashed") +
  geom_hline(yintercept = 3,linetype="solid", color="grey80") +
  geom_point() + facet_wrap(~FIB+Temp)  + geom_smooth(method = lm) + 
  scale_x_continuous(breaks=c(0,3,6,9)) + 
  scale_y_continuous(breaks= seq(2,8, 1)) +
  coord_cartesian(ylim =  c(1.5, 8)) +
  annotation_logticks(base = 10, sides = "l",scaled=TRUE, alpha=.4) +
  labs(
    x="Day",
    y=expression(paste("Log CFU/g")),
    #subtitle="Dieoff only occurs at 55", 
    #caption=fn("Effect of Temperature"), 
    caption=fn("Effect of Temperature and Recipe"),
    color="")

ggplot(sn, aes(x=Time, y=Methane, color=as.factor(Recipe)))+#, shape=as.factor(OL))) +
  geom_point()  + geom_smooth(method = lm) + 
  facet_wrap(~Temp )  +
  labs(caption=fn("Effect of Temperature and Recipe"), color="")

Here we have a function to make the removal plots:

make_delta_plot <- function(
  sntall, 
  grps=c("FIB","Time", "Recipe", "Temp", "OL"),
  shapes=F,
  size=T,
  times=c(6,9), OLs=c(.5, 2, 3.5),
  error="sem")
{
  newdf2 <- sntall %>% filter(OL %in% OLs) %>%   droplevels() 
  grpvars <- syms(grps)
  newdf2$removal <- - newdf2$init_count + newdf2$Count  + 3
  newdf <- newdf2 %>% group_by(!!!grpvars) %>%
    filter(Time %in% times) %>%
    #group_by(FIB, Time, Recipe, Temp) %>% 
    summarize(meen = mean(removal),
            medin = median(removal),
            # confirm groupings allow for proper error propogation
            propsdev = sqrt(sd(init_count)^2 + sd(Count)^2),
            sdev = sd(removal),
            sem = sd(removal)/sqrt(n()),
            sum = sum(removal),
            n = n(),
            q1 = quantile(removal, 0.25),
            q3 = quantile(removal, 0.75))
  if (error == "sem"){
    newdf$error <- newdf$sem
  } else if (error == "sd"){
    newdf$error <- newdf$sdev
  } else{
    stop("Only options for error are sd and sem!")
  }
  if (!shapes || !"OL" %in% grps){
    newdf$thisshape <- "Combined OLs"
    shape_labels <- c("Combined OLs")
    shape_breaks <- c(21)
    shape_guide <- "none"
  } else {
    newdf$thisshape <- c(21, 23, 24, 25)[as.numeric(as.factor(newdf$OL))]
    shape_labels <- unique(c("21", "23", "24", "25")[as.numeric(as.factor(newdf$OL))])
    shape_breaks <- unique(c(21, 23, 24, 25)[as.numeric(as.factor(newdf$OL))])
    newdf$thisshape <- newdf$OL
    shape_guide <- guide_legend(override.aes = list(size = 2))
  }
  if (size){
    newdf$nsize = newdf$n / 4
    newdf$nlabel = newdf$n 
    size_guide = guide_legend()
  } else{
    newdf$nsize = 2
    newdf$nlabel = "nope" 
    size_guide = "none"
  }
  colors <- viridisLite::viridis(n=3)
  if ("OL" %in% grps){
    thisalpha <- .8
  } else{
    thisalpha <- 1
  }
  newdf$tempcol = case_when( 
    newdf$Temp == 19 ~ colors[1],
    newdf$Temp == 37 ~ colors[2],
    newdf$Temp == 55 ~ colors[3])
  # get the areas sorted -- genom area doesnt extend, and ends  up looking lame.
  ids <- factor(c("initgood", "initbad", "good", "bad"))
  values <- data.frame(
    id = ids,
    value = c("#044702","#470202", "green", "red")
  )
  positions <- data.frame(
    id = rep(ids, each = 4),
    x = c(c(-1.5, -1.5, 1, 1), c(-1.5, -1.5, 1, 1), c(1, 1, 10, 10), c(10, 10, 1, 1))/3,
    y = c(c(-10, 0, 0, -10), c(0, 10, 10, 0), c(-10, 0, 0, -10), c(0, 10, 10, 0)),
    alpha=c(rep(.15, 8), rep(.09, 8))
  )
  # Currently we need to manually merge the two together
  datapoly <- merge(values, positions, by = c("id"))
  (p <- ggplot() +
      geom_polygon(data=datapoly,  
                   aes(y=y, x=x, fill = value, group = id, alpha=alpha)) + 
      scale_fill_identity() +
      scale_alpha_identity(guide=F)  +
      theme(plot.caption = element_text(size=8),
        #    panel.grid = element_blank(),
        panel.grid.minor  = element_blank(),
        panel.grid.major.y =  element_blank(),
        panel.grid.major.x =  element_line(color="gray95"),

      ) +
      geom_hline(yintercept = 0,linetype="solid", color="grey70", size=.5) +
      #scale_x_continuous(breaks=rev(c(0,3,6,9)), ) + 
      geom_errorbar(
        data=newdf,
        aes(
          x=Time/3,ymin=meen-error, ymax=meen+error, 
          group=interaction(!!!syms(grps[!grps %in% c("Time", "FIB")])), 
          #color=tempcol
        ),
        alpha=0.7, 
        width=.3,
        color="black",
        position=position_dodge(width =.5), 
      ) +
      geom_point(
        data=newdf,
        aes(
          x=Time/3, y=meen,  
          shape=thisshape,
          size=nsize,
          group=interaction(!!!syms(grps[!grps %in% c("Time", "FIB")])), fill=tempcol, 
        ),
        alpha=thisalpha, 
        #size=2,
        color="black",
        position=position_dodge(width =.5), 
      ) + 
      facet_grid(Recipe ~ FIB, scales="free_y") + 
      # scale_x_reverse(breaks=c(0,3,6,9)) + 
      scale_x_reverse(breaks=times/3, labels=as.character(times)) + 
      scale_y_continuous(breaks=c( -2, 0, 2, 4)) + 
#      scale_shape_identity() +
      scale_shape_manual(values = c(21, 23, 24, 25), guide="legend") + 
      scale_size_identity("", 
                          labels=unique(newdf$nlabel),
                          breaks=unique(newdf$nsize),
                          guide="legend") + 
      # scale_shape_identity("",
      #                      #labels = shape_labels,
      #                      #breaks = "21", #shape_breaks,
      #                      guide = "legend") +

      scale_color_identity("", 
                           labels = unique(newdf$Temp), 
                           breaks = unique(newdf$tempcol),
                           guide = "legend") +
    guides(color = guide_legend(override.aes = list(size = 2)),
           shape = shape_guide,
           size = size_guide) +
      labs(x="Days", y=expression(Delta~Log(CFU)),  color="Temp", shape="", 
           title="Meeting EU pathogen removal regulations", subtitle="EU regultations require FIB counts to be less than 3000 CFUs.\nGreen/red zones indicates sufficient/insufficient pathogen removal respectively",
           caption=paste("Groups:", paste(grps, collapse= ", "), "\nOLs:", paste(OLs, collapse= ", "), "\nn>=4")) +
      #coord_cartesian() +
      coord_flip(xlim=c(min(times/3) -.5, max(times/3) +.5), 
                 ylim=c(min(newdf$meen) -.5, max(newdf$meen) + .5)) 
  )
  return(p)
}
(p1 <- make_delta_plot(sntall, grps=c("FIB","Time", "Recipe", "Temp"), times=c(3, 6, 9), OLs=c(.5, 2, 3.5), shapes=T, size=F, error="sem"))

ggsave(p1, filename = "./removalplot.png", device = "png", width = 7, height = 6)